Speech recognition driven system with selectable speech models

ABSTRACT

A speech recognition driven system provides a speech model based on a biometric signature. Initially, the speech recognition driven system receives a biometric signature from a user of the system. Based upon the received biometric signature, the system selects a speech model. The selected speech model is utilized to determine whether a voice input provided by the user corresponds to a speech selectable task that is recognized by the speech recognition driven system. When the voice input provided by the user corresponds to the speech selectable task, the system causes the speech selectable task to be performed. In one embodiment, the biometric signature is an image of the user's face. When face recognition technology is implemented, the image of the user's face is utilized to select a speech model.

TECHNICAL FIELD

The present invention is directed to speech recognition, and more specifically to a speech recognition driven system with selectable speech models.

BACKGROUND OF THE INVENTION

A number of biometric signatures have been utilized to identify a particular individual. For example, fingerprint, retina, iris, face and voice recognition technologies have utilized pattern recognition techniques to uniquely identify a particular individual. Face and voice recognition systems are particularly attractive as they are normally unobtrusive and are passive (i.e., they do not require electromagnetic illumination of the subject of interest). A number of face recognition systems are currently available (e.g., products are offered by Visionics, Viisage and Miros). Further, some vendors offer products that utilize multiple biometric signatures to uniquely identify a particular individual. For example, Dialog Communication Systems (DCS AG) has developed BioID™ (a multimodal identification system that uses face, voice and lip movement to uniquely identify an individual).

As is well known to one of ordinary skill in the art, speech recognition is a field in computer science that deals with designing computer systems that can recognize spoken words. A number of speech recognition systems are currently available (e.g., products are offered by IBM, Dragon Systems, Lernout & Hauspie and Philips). Most of these systems modify a speech model, based on a user's input, to enhance the accuracy of the system. Traditionally, speech recognition systems have been used in only a few specialized situations due to their cost and limited functionality. For example, such systems have been implemented when a user is unable to use a keyboard to enter data because the user's hands are disabled. Instead of typing commands, the user speaks into a microphone.

However, as the cost of these systems has continued to decrease and their performance has continued to increase, speech recognition systems are being used in a wider variety of applications (as an alternative to keyboards or other user interfaces). For example, speech actuated control systems have been implemented in motor vehicles to control various accessories within the vehicles.

A typical speech recognition system implemented in a motor vehicle includes voice processing circuitry and memory for storing data that represents command words (which are employed to control various vehicle accessories). In a typical system, a microprocessor is utilized to compare the user-provided data (i.e., voice input) to stored speech models, to determine if a word match has occurred, and to provide a corresponding control output signal in such an event. The microprocessor has also normally controlled a plurality of motor vehicle accessories, e.g., a cellular telephone and a radio. Such systems have advantageously allowed a driver of the motor vehicle to maintain vigilance while driving the vehicle.
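The comparison described above (voice input against stored speech models, a word-match decision, then a control output) can be sketched as nearest-template matching. This is a minimal illustration only, assuming that the voice input has already been reduced to a fixed-length feature vector and that each command word is stored as one reference vector; the names `match_command`, `control_output`, `CONTROL_OUTPUTS` and the signal strings are hypothetical and do not describe any particular commercial recognizer.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical stored speech model: one reference feature vector per command word.
# Production recognizers use much richer models; this only illustrates matching.
SpeechModel = Dict[str, List[float]]

# Hypothetical mapping from a matched command word to a control output signal.
CONTROL_OUTPUTS: Dict[str, str] = {"radio": "RADIO_ON", "phone": "PHONE_DIAL"}


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_command(features: Sequence[float], model: SpeechModel,
                  threshold: float) -> Optional[str]:
    """Return the closest command word, or None if no word match has occurred."""
    best_word: Optional[str] = None
    best_dist = math.inf
    for word, template in model.items():
        dist = euclidean(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    # A match only counts if the closest template is within the threshold.
    return best_word if best_dist <= threshold else None


def control_output(features: Sequence[float], model: SpeechModel,
                   threshold: float = 1.0) -> Optional[str]:
    """Produce a control output signal only when a word match has occurred."""
    word = match_command(features, model, threshold)
    return CONTROL_OUTPUTS.get(word) if word is not None else None
```

The threshold is what lets the system reject input that is close to no stored word; without it, every utterance would be forced onto the nearest command.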

Acceptance of speech recognition as a primary interface for any multi-user system (e.g., an automobile) is dependent upon the recognition accuracy of the system. As mentioned above, one method for increasing speech recognition accuracy has been to implement systems that adapt to a speaker. This has entailed storing a continuously updated version of a speech model for each word or subword in a given vocabulary. In this manner, the system adjusts to the speaking pattern of a given individual, thus increasing the probability of correct recognition. Unfortunately, such systems generally cannot be utilized by multiple users (unless the multiple users have nearly identical speech patterns).

As such, a system that provides multiple adaptable, user-specific speech models is desirable.

SUMMARY OF THE INVENTION

The present invention is directed to a method and system that provides a speech model based on a biometric signature. Initially, the speech recognition driven system receives a biometric signature from the user of the system. Based upon the received biometric signature, the system selects a speech model. The selected speech model is utilized to determine whether a voice input, provided by the user, corresponds to a speech selectable task that is recognized by the speech recognition driven system. When the voice input corresponds to the speech selectable task, the system causes the speech selectable task to be performed. In one embodiment, the biometric signature is an image of the user's face. When face recognition technology is implemented, the image of the user's face is utilized to select a speech model. In another embodiment, the system uses a default speech model when the system fails to recognize the biometric signature. In yet another embodiment, the system creates a new speech model when the system fails to recognize the biometric signature. In a different embodiment, the selected speech model is updated such that the system adapts to the speech pattern of the user. An advantage of the present invention is that when an individualized speech model is selected, the error rate of the speech recognition driven system is generally reduced.

These and other features, advantages and objects of the present invention will be further understood and appreciated by those skilled in the art by reference to the following specification, claims and appended drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a speech recognition driven system implemented in a motor vehicle, according to an embodiment of the present invention; and

FIGS. 2A-2B are a flow diagram of a routine for a speech recognition driven system that selects a speech model based on a received biometric signature, according to an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

A speech recognition driven system, according to an embodiment of the present invention, selects an appropriate speech model based on a received biometric signature. When an individualized speech model is selected, the error rate of the speech recognition driven system is generally reduced. However, a default speech model may be utilized when a user (e.g., the driver) is not recognized or when the system cannot accept a new user. One of ordinary skill in the art will appreciate that, in this situation, the error rate is not reduced. A speech recognition driven system utilizing face recognition technology can be implemented without additional hardware in environments that already include a camera. Additionally, when implemented within an automobile, face recognition allows for the personalization of multiple automotive settings. For example, seat settings, mirror settings, radio presets and multimedia functions (such as address books, phone lists, Internet bookmarks and other features) can be initiated with face recognition technology. Face recognition technology can also provide security for a vehicle by controlling the operation of the vehicle (e.g., only allowing the vehicle to be placed into gear if the face of the driver is recognized).
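The personalization and security uses mentioned above can be sketched as a profile lookup keyed by the recognized driver, plus a gear interlock that depends only on whether recognition succeeded. This is a hypothetical illustration: the `DriverProfile` fields, the `Vehicle` class and the convention that an unrecognized face is reported as `None` are assumptions made for the sketch, not details of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DriverProfile:
    # Hypothetical per-driver settings restored after face recognition.
    seat_position: int
    mirror_angle: float
    radio_presets: List[float] = field(default_factory=list)


class Vehicle:
    def __init__(self, profiles: Dict[str, DriverProfile]) -> None:
        self.profiles = profiles
        self.current: Optional[DriverProfile] = None
        self.recognized = False

    def on_face_result(self, driver_id: Optional[str]) -> None:
        """Apply a known driver's settings; driver_id is None if the face is unknown."""
        profile = self.profiles.get(driver_id) if driver_id is not None else None
        self.recognized = profile is not None
        self.current = profile

    def request_gear(self) -> bool:
        """Security interlock: allow shifting into gear only for a recognized face."""
        return self.recognized
```

Keeping the interlock decision separate from the settings lookup means an unknown face still leaves the vehicle usable for listening to defaults, while only the gear request is refused.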

Referring to FIG. 1, a block diagram of a speech recognition driven system 100 (implemented within a motor vehicle) that utilizes face recognition technology, according to an embodiment of the present invention, is depicted. System 100 includes a processor 102 coupled to a motor vehicle accessory 124 and a display 120. Processor 102 controls motor vehicle accessory 124, at least in part, as dictated by voice input supplied by a user of system 100. Processor 102 also supplies various information to display 120 to allow a user of the motor vehicle to better utilize system 100. In this context, the term processor may include a general-purpose processor, a microcontroller (i.e., an execution unit with memory, etc., integrated within a single integrated circuit) or a digital signal processor.

Processor 102 is also coupled to a memory subsystem 104. Memory subsystem 104 includes an application-appropriate amount of main memory (volatile and non-volatile). An audio input device 118 (e.g., a microphone) is coupled to a filter/amplifier module 116. Filter/amplifier module 116 filters and amplifies the voice input provided by a user (through audio input device 118). Filter/amplifier module 116 is also coupled to an analog-to-digital (A/D) converter 114. A/D converter 114 digitizes the voice input from the user and supplies the digitized voice to processor 102 (which causes the voice input to be compared to system recognized commands). Processor 102 executes a commercially available routine to determine whether the voice input corresponds to a system recognized command.
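The input chain (filter/amplifier module 116 followed by A/D converter 114) can be modeled in software to show what reaches processor 102. This is a simplified sketch under stated assumptions: the analog stage is approximated by a gain followed by a first-order low-pass filter, and the converter by ideal clipping to a full-scale range of -1.0 to 1.0 followed by rounding to signed integer codes. The function names, the filter form and the 16-bit default are illustrative choices, not the hardware described above.

```python
from typing import List, Sequence


def amplify_and_filter(samples: Sequence[float], gain: float,
                       alpha: float) -> List[float]:
    """Apply gain, then a first-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += alpha * (gain * x - y)
        out.append(y)
    return out


def quantize(samples: Sequence[float], bits: int = 16) -> List[int]:
    """Idealized A/D conversion: clip to [-1.0, 1.0], map to codes in [-max_code, max_code]."""
    max_code = 2 ** (bits - 1) - 1
    codes: List[int] = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # an over-driven input saturates
        codes.append(round(clipped * max_code))
    return codes
```

The clipping step shows why the amplifier gain matters: too much gain saturates the converter and distorts the voice input before recognition ever sees it.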

Processor 102 may also cause an appropriate voice output to be provided to the user, ultimately through an audio output device 112. The synthesized voice output is provided by processor 102 to a digital-to-analog (D/A) converter 108. D/A converter 108 is coupled to a filter/amplifier module 110, which amplifies and filters the analog voice output. The amplified and filtered voice output is then provided to audio output device 112 (e.g., a speaker). While only one motor vehicle accessory module is shown, it is contemplated that any number of accessories typically provided in a motor vehicle (e.g., a cellular telephone or radio) can be implemented.

In a preferred embodiment, a biometric signature is provided to processor 102 by biometric signature device 122. Preferably, device 122 is a digital camera that utilizes a charge-coupled device (CCD). As is well known to one of ordinary skill in the art, a CCD includes an array of light-sensitive elements (i.e., capacitors). The capacitors are charged by electrons generated by the light (i.e., photons) that reaches a given capacitor of the CCD array. In a preferred embodiment, the output of the CCD array is provided as a serial output (e.g., on a universal serial bus (USB)) to processor 102. The image derived from the CCD array is compared with stored images (or stored as a new image) and allows a stored speech model to be selected (or a new speech model to be created), based upon recognition of a specific user. One of ordinary skill in the art will appreciate that device 122 can be an apparatus for receiving other user biometrics (e.g., fingerprints, retina and iris).

FIGS. 2A-2B are a flowchart of a face recognition routine 200 that is active when the automobile is running, according to an embodiment of the present invention. In step 202, routine 200 is initiated. Next, in step 204, a digital image of the driver's face is captured. As previously discussed, this occurs under the control of processor 102. When implemented as a digital camera, device 122 (under control of processor 102) captures and transfers a digital image of the face of the driver of the vehicle to processor 102. Next, in step 206, processor 102 (executing commercially available face recognition software) compares the captured image to stored images of known drivers. Then, in step 208, processor 102 determines whether the captured image corresponds to a stored image. If so, control transfers from step 208 to step 216. Otherwise, control transfers from step 208 to step 210.
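Steps 206 and 208 can be sketched as a best-match search with an acceptance threshold. The sketch assumes, hypothetically, that the face recognition software reduces each image to a numeric feature vector; it then compares vectors by cosine similarity. The function `identify_driver`, the similarity measure and the 0.9 threshold are all illustrative assumptions; the commercially available software referred to above may work quite differently.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_driver(captured: Sequence[float],
                    known_drivers: Dict[str, Sequence[float]],
                    threshold: float = 0.9) -> Optional[str]:
    """Step 206: compare with stored faces. Step 208: accept the best match or return None."""
    best_id: Optional[str] = None
    best_score = -1.0
    for driver_id, stored in known_drivers.items():
        score = cosine_similarity(captured, stored)
        if score > best_score:
            best_id, best_score = driver_id, score
    # None corresponds to "no stored image matched", i.e. continue at step 210.
    return best_id if best_score >= threshold else None
```

Returning `None` rather than the closest driver when the score is low is what routes an unfamiliar face to step 210 instead of silently loading another driver's speech model.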

In step 210, processor 102 determines whether the new image is to be stored. If so, control transfers to step 212, where processor 102 causes a new speech model to be associated with the new image. From step 212, control transfers to step 218. In step 210, if processor 102 determines that the new image will not be stored, control transfers from step 210 to step 214. In step 214, processor 102 causes a default speech model to be loaded. From step 214, control transfers to step 218.
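The three-way choice among a stored model (step 216), a new model (step 212) and the default model (step 214) can be sketched as a small registry. The specification does not say how step 210 decides whether to store a new image, so that decision is passed in as a callable. Other assumptions, labeled in the comments: model identifiers equal driver identifiers, new identifiers are generated as `driver1`, `driver2`, and so on, and a new model starts as a copy of the default model. `SpeechModelStore` and its methods are hypothetical names.

```python
import copy
from typing import Callable, Dict, List, Optional

DEFAULT_MODEL_ID = "default"  # hypothetical identifier for the default speech model
SpeechModel = Dict[str, List[float]]


class SpeechModelStore:
    """Hypothetical registry pairing stored face features with speech models."""

    def __init__(self, default_model: SpeechModel) -> None:
        self.models: Dict[str, SpeechModel] = {DEFAULT_MODEL_ID: default_model}
        self.faces: Dict[str, List[float]] = {}  # driver id -> stored face features
        self._next_id = 1

    def select(self, matched_driver: Optional[str], captured_face: List[float],
               should_store: Callable[[], bool]) -> str:
        """Return the id of the speech model to load before step 218."""
        # Assumption: a recognized driver's model is stored under the driver's id.
        if matched_driver is not None and matched_driver in self.models:
            return matched_driver                      # step 216: stored model
        if should_store():                             # step 210
            driver_id = f"driver{self._next_id}"
            self._next_id += 1
            self.faces[driver_id] = list(captured_face)
            # Step 212 (assumption): seed the new model from a deep copy of the
            # default so later adaptation never alters the default model.
            self.models[driver_id] = copy.deepcopy(self.models[DEFAULT_MODEL_ID])
            return driver_id
        return DEFAULT_MODEL_ID                        # step 214: default model
```

Deep-copying the default when creating a user model matters once adaptation is added: a shallow copy would share the per-word template lists, so updating one user's model could silently change the default for everyone.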

In step 208, if processor 102 determines that the captured image corresponds to a stored image, control transfers to step 216. In step 216, processor 102 retrieves a stored speech model that corresponds to the captured image. Then, in step 218, processor 102 activates the speech recognition feature. Next, in step 220, if speech is detected, control transfers to step 230. Otherwise, control loops on step 220 until speech is detected (while routine 200 is active). In step 230, processor 102 determines whether the speech is recognized. If so, control transfers from step 230 to step 234. Otherwise, control transfers from step 230 to step 232, where processor 102 causes a prompt (e.g., voice or visual), such as “the detected speech is unrecognized, please repeat the command”, to be provided to the user. From step 232, control transfers to step 220. In step 234, processor 102 causes the command that is associated with the recognized speech to be performed (e.g., changing the channel of an automotive radio receiver in response to the command “FM, 101.1”). From step 234, control transfers to step 236.
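The loop formed by steps 220 through 234 can be sketched as follows. Detected utterances are modeled as an iterable of strings, and the recognizer, the command table and the prompt output are passed in as callables; these are hypothetical stand-ins for the commercially available recognition routine and the accessory interfaces, chosen so the control flow can be shown on its own.

```python
from typing import Callable, Dict, Iterable, Optional

PROMPT = "the detected speech is unrecognized, please repeat the command"


def run_recognition(utterances: Iterable[str],
                    recognize: Callable[[str], Optional[str]],
                    commands: Dict[str, Callable[[], None]],
                    prompt: Callable[[str], None]) -> Optional[str]:
    """Steps 220-234: loop until an utterance is recognized, then perform its command."""
    for speech in utterances:                # step 220: speech detected
        command = recognize(speech)          # step 230: is the speech recognized?
        if command is None or command not in commands:
            prompt(PROMPT)                   # step 232, then back to step 220
            continue
        commands[command]()                  # step 234: perform the command
        return command                       # continue at step 236
    return None  # no more speech: routine 200 became inactive first
```

For example, a command table such as `{"FM, 101.1": tune_radio}` (with `tune_radio` a hypothetical accessory function) would retune the radio on the first recognized utterance and prompt the user for every unrecognized one before it.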

In step 236, processor 102 determines whether the default speech model is being utilized. If so, control transfers from step 236 to step 240. Otherwise, control transfers to step 238, where processor 102 causes a user-specific speech model to be updated. From step 238, control transfers to step 240, where routine 200 terminates. Thus, a face recognition routine 200 has been described that allows a speech recognition driven system to determine which specific user is utilizing the vehicle at a given time. Based upon the user, a new speech model is created, a stored speech model is updated or the default speech model is used.
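Steps 236 and 238 can be sketched as a guarded update. The specification does not state how a user-specific model is updated, so this sketch assumes, as one simple form of speaker adaptation, that each word's stored template is moved a fixed fraction of the way toward the features just observed for that word (an exponential moving average). `adapt_template`, `maybe_update` and the `rate` parameter are hypothetical.

```python
from typing import Dict, List, Sequence

DEFAULT_MODEL_ID = "default"  # hypothetical identifier for the default speech model


def adapt_template(template: Sequence[float], observed: Sequence[float],
                   rate: float = 0.1) -> List[float]:
    """Move a stored template a fraction `rate` of the way toward the observed features."""
    return [t + rate * (o - t) for t, o in zip(template, observed)]


def maybe_update(models: Dict[str, Dict[str, List[float]]], model_id: str,
                 word: str, observed: Sequence[float], rate: float = 0.1) -> bool:
    """Step 236: skip the default model. Step 238: adapt a user-specific model."""
    if model_id == DEFAULT_MODEL_ID:
        return False  # step 236 -> step 240, default model left unchanged
    models[model_id][word] = adapt_template(models[model_id][word], observed, rate)
    return True
```

Skipping the update for the default model keeps it neutral for unrecognized drivers, which is the point of step 236: only a model tied to a specific face is allowed to drift toward that person's speech pattern.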

In a preferred embodiment, device 122 is a camera that is focused on the driver's face (preferably mounted in the vehicle's windshield molding). A camera so implemented can also be used for drowsy-driver detection and point-of-gaze based control systems. Utilizing a camera in this manner is desirable in that the face recognition aspect of the speech recognition driven system can perform multiple functions. As discussed above, other biometric signatures (e.g., fingerprint, retina, iris) can be utilized to select a particular speech model. Face recognition based selection of speech models is generally preferred to the use of individualized key fobs (one for each specific driver of a given automobile), as the key fobs can be accidentally switched among various drivers of the automobile, at which point the key fobs cannot be used to identify a specific driver.

The above description is considered that of the preferred embodiments only. Modifications of the invention will occur to those skilled in the art and to those who make or use the invention. Therefore, it is understood that the embodiments shown in the drawings and described above are merely for illustrative purposes and not intended to limit the scope of the invention, which is defined by the following claims as interpreted according to the principles of patent law, including the Doctrine of Equivalents.

What is claimed is:
 1. A method for providing a speech model based on a biometric signature in a speech recognition driven system, comprising the steps of: receiving a biometric signature from a user of the system; selecting a speech model based on the received biometric signature; utilizing the selected speech model to determine whether a voice input provided by the user corresponds to a speech selectable task that is recognized by the speech recognition driven system; and performing a speech selectable task when the voice input provided by the user corresponds to a speech selectable task.
 2. The method of claim 1, wherein the biometric signature is an image of the user's face.
 3. The method of claim 1, wherein the system utilizes a default speech model when the system fails to recognize the biometric signature.
 4. The method of claim 1, wherein the system creates a new speech model when the system fails to recognize the biometric signature.
 5. The method of claim 1, further including the step of: updating the selected speech model such that the system adapts to the speech pattern of the user.
 6. The method of claim 1, further including the step of: prompting the user to provide another voice input when the voice input is not recognized.
 7. The method of claim 1, wherein the speech selectable task is performed by a motor vehicle accessory.
 8. A speech recognition driven system that utilizes selectable speech models, comprising: a memory subsystem for storing information; a processor coupled to the memory subsystem; an audio input device coupled to the processor, the input device receiving a voice input from a user; and speech recognition code for causing the processor to perform the steps of: receiving a biometric signature from the user of the system; selecting a speech model based on the received biometric signature; utilizing the selected speech model to determine whether the voice input provided by the user corresponds to a speech selectable task that is recognized by the speech recognition driven system; and performing a speech selectable task when the voice input provided by the user corresponds to a speech selectable task.
 9. The system of claim 8, wherein the biometric signature is an image of the user's face.
 10. The system of claim 8, wherein the system utilizes a default speech model when the system fails to recognize the biometric signature.
 11. The system of claim 8, wherein the system creates a new speech model when the system fails to recognize the biometric signature.
 12. The system of claim 8, wherein the speech recognition code causes the processor to perform the additional steps of: updating the selected speech model such that the system adapts to the speech pattern of the user.
 13. The system of claim 8, wherein the speech recognition code causes the processor to perform the additional steps of: prompting the user to provide another voice input when the voice input is not recognized.
 14. The system of claim 8, further including: an audio output device coupled to the processor, the output device providing voice feedback to the user.
 15. The system of claim 14, wherein the audio output device is a speaker.
 16. The system of claim 8, wherein the audio input device is a microphone.
 17. The system of claim 8, wherein the speech selectable task is performed by a motor vehicle accessory.
 18. A multi-level speech recognition driven system for controlling motor vehicle accessories that utilizes selectable speech models, comprising: a memory subsystem for storing information; a processor coupled to the memory subsystem; a motor vehicle accessory coupled to the processor; an audio input device coupled to the processor, the input device receiving a voice input from a user; and speech recognition code for causing the processor to perform the steps of: receiving a biometric signature from the user of the system; selecting a speech model based on the received biometric signature; utilizing the selected speech model to determine whether the voice input provided by the user corresponds to a speech selectable task that is recognized by the speech recognition driven system; and controlling the motor vehicle accessory according to a speech selectable task when the voice input provided by the user corresponds to a speech selectable task.
 19. The system of claim 18, wherein the biometric signature is an image of the user's face.
 20. The system of claim 18, wherein the system utilizes a default speech model when the system fails to recognize the biometric signature.
 21. The system of claim 18, wherein the system creates a new speech model when the system fails to recognize the biometric signature.
 22. The system of claim 18, wherein the speech recognition code causes the processor to perform the additional steps of: updating the selected speech model such that the system adapts to the speech pattern of the user.
 23. The system of claim 18, wherein the speech recognition code causes the processor to perform the additional steps of: prompting the user to provide another voice input when the voice input is not recognized.
 24. The system of claim 18, further including: an audio output device coupled to the processor, the output device providing voice feedback to the user.
 25. The system of claim 24, wherein the audio output device is a speaker.
 26. The system of claim 18, wherein the audio input device is a microphone.